A new day, a new plot, a new geom. The goal of this exercise is to learn about a new type of plot (ridgeline plot) and to learn how to make it. Use the geom_density_ridges() function from the ggridges package to make a ridge plot of Airbnb review scores of Edinburgh neighborhoods. The neighborhoods should be ordered by their median review scores. The data can be found in the dsbox package, and it’s called edibnb. Also include an interpretation for your visualization. You should review feedback from your Homework 1 to make sure you capture anything you may have missed previously.
Data Analysis - Q1
Table 1. Diagnostic Summary for review_scores_rating (edibnb data set)
Metric
Value
Data Type
numeric
Min
20
1st Quartile
93
Median
97
Mean
95.0246657029274
3rd Quartile
99
Max
100
Missing Values
2177
IQR
6
Lower Outlier Bound
84
Upper Outlier Bound
108
Outlier Count
576
Code
# =========================================# Visualization: Ridge Plot by Neighborhood# =========================================# - Filter out rows with NA in neighbourhood or review_scores_ratingedibnb_clean <- edibnb |>filter(!is.na(neighbourhood), !is.na(review_scores_rating))# - Calculate median review scores by neighborhoodmedian_scores <- edibnb_clean |>group_by(neighbourhood) |>summarize(median_score =median(review_scores_rating, na.rm =TRUE)) |>arrange(median_score)# - Reorder neighborhoods by median scoreedibnb_clean <- edibnb_clean |>mutate(neighbourhood =factor(neighbourhood, levels = median_scores$neighbourhood))# ==== New: Calculate mean review scores for each neighborhood (for annotation) ====mean_scores <- edibnb_clean |>group_by(neighbourhood) |>summarize(mean_score =mean(review_scores_rating, na.rm =TRUE))# ==== Modified plot: add geom_text for mean scores offset to the right ====# - Get the first and last neighborhood (lowest & highest median)low_neigh <-levels(edibnb_clean$neighbourhood)[1]high_neigh <-levels(edibnb_clean$neighbourhood)[length(levels(edibnb_clean$neighbourhood))]# - Maximum x-position for annotationmax_annot_x <-max(mean_scores$mean_score, na.rm =TRUE) +25# Create a named vector of colors for neighborhoods - color for the envelopes.neighborhood_colors <-setNames(viridis::viridis(length(levels(edibnb_clean$neighbourhood))), levels(edibnb_clean$neighbourhood))#===============# Make the plot#===============g1 <-ggplot(edibnb_clean, aes(x = review_scores_rating, y = neighbourhood)) +geom_density_ridges( # - use geom_density_ridges aes(color = neighbourhood),fill ="cornsilk4", # - fill in the area under the curvealpha =0.7,scale =2.0, # - this makes the ridges 'larger' on the plotted area.bandwidth =1, # you can adjust this value as neededshow.legend =FALSE ) +geom_point(data = mean_scores,aes(x = mean_score, y = neighbourhood, color ="Mean Score"),size =3,shape =18 ) +geom_segment(data = mean_scores,aes(x = mean_score,xend = mean_score +20,y = neighbourhood,yend =as.numeric(neighbourhood) +0.5 ),color ="black",linewidth =0.3 ) + ggrepel::geom_text_repel(data = mean_scores,aes(x = mean_score +20,y =as.numeric(neighbourhood) +0.5,label =round(mean_score, 1) ),hjust =0,size =3.2,color ="black",nudge_y =0.15,segment.color =NA,direction ="y",box.padding =0.3,point.padding =0.5,show.legend =FALSE ) +annotate( # - put an annotation next to the 'low' value"text",x = max_annot_x,y =1.5,label ="Low",hjust =-1,size =5,color = neighborhood_colors[[low_neigh]] ) +annotate( # - put an annotation next to the 'high' value"text",x = max_annot_x,y =length(levels(edibnb_clean$neighbourhood)) +0.5,label ="High",hjust =-1,size =5,color = neighborhood_colors[[high_neigh]] ) +scale_x_continuous(name ="Review Scores Rating", limits =c(0, 150)) +scale_color_manual( # - adds color to the curve envelopes..name ="Legend",values =c(neighborhood_colors, "Mean Score"="black"),breaks ="Mean Score",labels ="Mean Score" ) +labs(title ="Distribution of Airbnb Review Scores by Edinburgh Neighborhood\n", y ="Neighborhood",caption ="Source: Inside Airbnb via Kaggle (data from June 2019)") +theme_ridges() +theme(legend.position ="right",legend.title =element_text(hjust =0.6),axis.title.y =element_text(vjust =1.5, hjust =0.5),plot.title =element_text(hjust =0.5),axis.title.x =element_text(hjust =0.5),plot.caption =element_text( # - stylizing the captionhjust =0.5, # right aligncolor ="grey50", # lighter greysize =8 ) ) # - show the graphg1
Interpretation The graph (Distribution of Airbnb Review Scores by Edinburgh Neighborhood) displays the distribution of Airbnb review scores across Edinburgh neighborhoods using ridgeline plots, with each neighborhood’s mean score marked by a diamond (via double encoding). The mean review scores are generally high, ranging from about 93.9 to 95.9, on a scale of 0-100. Some neighborhoods, like Morningside and Bruntsfield, show slightly higher average scores. The variation in score spread highlights differences in review consistency between neighborhoods, making it easier to compare where listings tend to receive better feedback.
2 - Foreign Connected PACs
Question #2a
Make a graph: Contributions to US political parties from UK-connected PACs.
Data Analysis - Q2
Table 1. Diagnostic Summary for dems and repubs
Variable
Metric
Value
dems
Data Type
numeric
dems
Min
-9050
dems
1st Quartile
2000
dems
Median
11000
dems
Mean
35667.7589807853
dems
3rd Quartile
40000
dems
Max
853223
dems
Missing Values
0
dems
IQR
38000
dems
Lower Outlier Bound
-55000
dems
Upper Outlier Bound
97000
dems
Outlier Count
245
repubs
Data Type
numeric
repubs
Min
-11000
repubs
1st Quartile
3000
repubs
Median
18500
repubs
Mean
50162.7623224729
repubs
3rd Quartile
58000
repubs
Max
812500
repubs
Missing Values
0
repubs
IQR
55000
repubs
Lower Outlier Bound
-79500
repubs
Upper Outlier Bound
140500
repubs
Outlier Count
236
Code
# ====================================# Data Wrangle - Pivot and Summarize# ===================================# - Pivot data to longer format: create 'party' and 'amount' columnspac_long <- pac |>pivot_longer(cols =c(dems, repubs),names_to ="party",values_to ="amount" ) |># - Replace party values with readable labelsmutate(party =recode(party,"dems"="Democrat","repubs"="Republican"))# - Filter for UK-origin PACs and remove NA values in amountyearly_totals <- pac_long |>filter(country =="UK", !is.na(amount)) |># - Summarize total contributions by year and partygroup_by(year, party) |>summarise(amount =sum(amount), .groups ="drop")# ==================# Create the Plot# ==================g2a <-ggplot(yearly_totals, aes(x = year, y = amount, color = party)) +geom_line(linewidth =1.2) +# Keep line onlyscale_y_continuous(name ="Total amount",labels =label_dollar(scale =1e-6, suffix ="M") ) +scale_x_continuous(name ="Year",breaks = scales::pretty_breaks() ) +scale_color_manual(values =c("Democrat"="blue", "Republican"="red"),labels =c("Democrat", "Republican"),name ="Party" ) +labs(title ="Contributions to US political parties from UK-connected PACs",caption ="Source: OpenSecrets.org" ) +theme(legend.position.inside =c(0.9, 0.15), # - This may cause clipping warningsaxis.title.x =element_text(hjust =0), # - Align X-axis label leftaxis.title.y =element_text(hjust =0), # - Align Y-axis label leftplot.caption =element_text(hjust =1, size =12) )# - Show the plotg2a
Question #2b
Make a graph: Contributions to US political parties from non-UK-connected PACs. Let’s pick Switzerland.
Code
# ==================================# Data Wrangle - Pivot and Summarize# ==================================pac_long <- pac |>pivot_longer(cols =c(dems, repubs),names_to ="party",values_to ="amount" ) |>mutate(party =recode(party,"dems"="Democrat","repubs"="Republican"))yearly_totals <- pac_long |>filter(country =="Switzerland", !is.na(amount)) |>group_by(year, party) |>summarise(amount =sum(amount), .groups ="drop")# ==================# Create the Plot# ==================g2a <-ggplot(yearly_totals, aes(x = year, y = amount, color = party)) +geom_line(linewidth =1.2) +scale_y_continuous(name ="Total amount",labels =label_dollar(scale =1e-6, suffix ="M") ) +scale_x_continuous(name ="Year",breaks = scales::pretty_breaks() ) +scale_color_manual(values =c("Democrat"="blue", "Republican"="red"),labels =c("Democrat", "Republican"),name ="Party" ) +labs(title ="Contributions to US political parties from Switzerland-connected PACs") +theme(legend.position.inside =c(0.9, 0.15),axis.title.x =element_text(hjust =0),axis.title.y =element_text(hjust =0) )# - Show the plot# print(g2a)g2a
3 - Median housing prices in the US
Question #3a
Re-create the graph: Median Housing Prices in the US - not seasonally adjusted
Data Analysis - Q3
Combined Diagnostic Summary for Median Housing and Recession Data
Dataset
Metric
Value
median_housing
date - Data Type
Date
median_housing
price - Data Type
numeric
median_housing
date - Missing Values
0
median_housing
price - Missing Values
0
median_housing
price - Min
17800
median_housing
price - 1st Quartile
49575
median_housing
price - Median
124350
median_housing
price - Mean
140386.752136752
median_housing
price - 3rd Quartile
223350
median_housing
price - Max
374900
median_housing
date - Range Start
1963-01-01
median_housing
date - Range End
2021-04-01
recessions
start - Data Type
Date
recessions
end - Data Type
Date
recessions
start - Missing Values
0
recessions
end - Missing Values
0
recessions
start - Range Start
1857-06-01
recessions
start - Range End
2020-02-01
recessions
end - Range Start
1858-12-01
recessions
end - Range End
2020-04-01
Code
# Load the median housing data# median_housing <- read_csv("data/median-housing.csv") |># rename(date = DATE, price = MSPUS) |># mutate(date = ymd(date)) ## # # - Load the recession data# recessions <- read_csv("data/recessions.csv") |># rename(start = Peak, end = Trough) |># mutate(start = ymd(start), end = ymd(end)) # mutate to y-m-d format# Custom y-axis breaks (extend past 400,000 to ensure it shows)y_breaks <-seq(0, 440000, by =40000)# ADDED: Custom x-axis breaks (exclude endpoints)x_breaks <-seq(1965, 2020, by =5) |>paste0("-01-01") |>ymd()# Create the visualizationg3a <-ggplot(median_housing, aes(x = date, y = price)) +geom_line(color ="#4d72e3") +# - REMOVED: Recession shading (geom_rect layer)# - REPLACED: Restrict x-axis and apply custom breaksscale_x_date(breaks = x_breaks,limits =c(ymd("1963-01-01"), ymd("2021-04-01")), # x-limitsdate_labels ="%Y"# format to '2000' ) +# Custom y-axis breaks and formatted labelsscale_y_continuous(breaks = y_breaks, labels =comma_format(),limits =c(0, 400000), # y-limitsexpand =c(0, 0) ) +# Axis labelslabs(title ="Median sales price of houses sold in the United States\nNot seasonally adjusted",y ="Dollars",x =NULL,# - Source annotationcaption ="Sources: Census; HUD" ) +theme_minimal() +# - Right-align the captiontheme(plot.title =element_text(hjust =0,margin =margin(b=15,l=-50,r=0) # - control margins of plot title. ), # Align leftplot.caption =element_text(hjust =1, size =9, face ="italic"),plot.margin =margin(t =5, r =0, b =10, l =30), # Increase left margin to push plot body rightpanel.grid.major.x =element_blank(), # Remove major vertical grid linespanel.grid.minor.x =element_blank(), # Remove minor vertical grid linespanel.grid.major.y =element_line(), # Keep major horizontal grid linespanel.grid.minor.y =element_blank() # Remove minor horizontal grid lines )plot(g3a)
Question #3b
• Identify recessions that happened during the time frame of the median_housing dataset. Do this by adding a new variable to recessions that takes the value TRUE if the recession happened during this time frame and FALSE if not. • Now recreate the following visualization. The shaded areas are recessions that happened during the time frame of the median_housing dataset. Hint: The shaded areas are “behind” the line.
Code
# Load the median housing data - assumed already loaded elsewhere# median_housing <- read_csv("data/median-housing.csv") |> # rename(date = DATE, price = MSPUS) |> # mutate(date = ymd(date)) # Load the recession datarecessions <-read_csv("data/recessions.csv") |>rename(start = Peak, end = Trough) |>mutate(start =ymd(start), end =ymd(end)) # Custom y-axis breaks (extend past 400,000 to ensure it shows)y_breaks <-seq(0, 440000, by =40000)# Custom x-axis breaks (exclude endpoints)x_breaks <-seq(1965, 2020, by =5) |>paste0("-01-01") |>ymd()# Create the visualizationg3b <-ggplot(median_housing, aes(x = date, y = price)) +geom_line(color ="#4d72e3") +# Recession shadinggeom_rect(data = recessions,aes(xmin = start, xmax = end, ymin =-Inf, ymax =Inf),inherit.aes =FALSE,fill ="gray",alpha =0.3 ) +# Restrict x-axis and apply custom breaksscale_x_date(breaks = x_breaks,limits =c(ymd("1961-01-01"), ymd("2025-01-01")),date_labels ="%Y" ) +# Custom y-axis breaks and formatted labelsscale_y_continuous(breaks = y_breaks,labels =comma_format(),limits =c(0, 400000),expand =c(0, 0) ) +# Axis labelslabs(title ="Median sales price of houses sold in the United States\nNot seasonally adjusted",y ="Dollars",x ="Shaded areas indicate U.S. recessions",caption ="Source: Census; HUD" ) +theme_minimal() +# Customize theme for captions, margins, and gridstheme(plot.caption =element_text(hjust =1, size =9, face ="italic"),plot.title =element_text(hjust =0,margin =margin(b =15, l =-50, r =0) ),plot.margin =margin(t =5, r =0, b =10, l =30),axis.title.x =element_text(hjust =1, # right-align the x-axis labelmargin =margin(t =10) # optional: add some space above the label ),# Grid linespanel.grid.major.x =element_blank(),panel.grid.minor.x =element_blank(),panel.grid.major.y =element_line(),panel.grid.minor.y =element_blank() )# Print plot without warningssuppressWarnings(print(g3b))
3b-Note: Some recession rows were intentionally excluded for the purpose of the assignment.
Question #3c
• Create a subset of median_housing dataset from 2019 and 2020. Add two columns: year and quarter. year is the year of the date and the quarter takes the values of Q1, Q2, Q3, or Q4 based on date • Re-create the visualization.
Code
# - Load the data (already loadedNo)# median_housing <- read_csv("data/median-housing.csv") |># rename(date = DATE, price = MSPUS) |># mutate(date = ymd(date))# - Filter to 2019 and 2020 only, keeping quarterly pointsquarter_dates <-seq(ymd("2019-01-01"), ymd("2020-10-01"), by ="3 months")median_housing_q <- median_housing |>filter(date %in% quarter_dates)# - Define quarter labelsquarter_labels <-rep(c("Q1", "Q2", "Q3", "Q4"), times =2)# - Define year sub-labels (x-axis) - best I could figure out how to doyear_labels <-c("", sprintf("%25s", "2019"), "", "","", sprintf("%25s", "2020"), "", "")# - Combine into single x-axis label with optional sub-labelscombined_labels <-ifelse( year_labels =="", quarter_labels,paste0(quarter_labels, "\n", year_labels))#===============# - Plot data#===============g3c <-ggplot(median_housing_q, aes(x = date, y = price)) +# --- Change 1 & 2: Thicken the line and make sure it does not run through points ---geom_line(color ="#4d72e3", linewidth =1, lineend ="round") +# Slightly thicker line# --- Change 1: Make circles look 'empty' by drawing points with stroke and fill white ---geom_point(shape =21, size =2, color ="#4d72e3", fill ="white", stroke =1.2) +# Empty circles# --- Change 3: Limit line to end exactly at Q4 by limiting x-axis ---scale_x_date(breaks = quarter_dates,labels = combined_labels,limits =c(ymd("2019-01-01"), ymd("2020-10-01")), # Already correct rangeexpand =c(0.02, 0.02) # <<< CHANGE: Prevent extension beyond last Q4 tick) +scale_y_continuous(breaks =seq(300000, 360000, by =20000),labels =comma_format(),limits =c(300000, 360000),expand =c(0, 0) ) +labs(title ="Median sales price of houses sold in the United States\nNot seasonally adjusted",x =NULL,y ="Dollars" ) +theme_minimal() +theme(plot.title =element_text(hjust =0,margin =margin(b=15,l=-50,r=0) # Control margins of plot title. ), # Align leftplot.caption =element_text(hjust =1, size =9, face ="italic"),plot.margin =margin(t =5, r =0, b =40, l =30), # Increase left margin to push plot body right#panel.grid.major.x = element_blank(), # Remove major vertical grid linespanel.grid.minor.x =element_blank(), # Remove minor vertical grid linespanel.grid.major.y =element_line(), # Keep major horizontal grid lines#panel.grid.minor.y = element_blank(), # Remove minor horizontal grid lines ) +# --- CHANGE: Use annotation_custom to add year labels below x-axis ---# Add year labels between Q2 and Q3 using annotate()annotate("text", x =ymd("2019-05-16"), y =298000, label ="2019", size =3.5) +annotate("text", x =ymd("2020-05-16"), y =298000, label ="2020", size =3.5)# - PlotsuppressWarnings(print(g3c))
3c-Note: Some recession rows were intentionally excluded for the purpose of the assignment.
4 - Expect More. Plot More.
Question #4
Recreate the Target LOGO. Describe steps.. (see code comments) 1. make inner dot 2. Make outer ring 3. Make ‘Target’ use ‘[R]’ symbol 4. Piece it all together.
Code
# library(ggplot2)# library(ggforce)# Define Target's signature red colortarget_red <-"#e82118"# Create a data frame for the concentric circles (outer red ring and inner white ring)circles <-data.frame(x0 =0, y0 =0, # Center of both circlesr =c(1, 0.7), # Radii for the outer red ring and inner white ringfill =c(target_red, "white") # Fill colors for each ring)# Construct the plotg1 <-ggplot() +# Draw the red and white rings (two concentric circles)geom_circle(data = circles, aes(x0 = x0, y0 = y0, r = r, fill = fill), color =NA) +# Draw the solid inner red circle (target bullseye center)geom_circle(aes(x0 =0, y0 =0, r =0.3), fill = target_red, color =NA) +# Add the text "TARGET" beneath the bullseyeannotate("text", x =0, y =-1.4, label ="TARGET", color = target_red, size =10, fontface ="bold") +# Add the registered trademark symbol next to "TARGET"annotate("text", x =0.6, y =-1.52, label ="®", color = target_red, size =8) +# Use fill colors as provided (don't map them to a color scale)scale_fill_identity() +# Ensure equal aspect ratio for x and y (perfect circles)coord_fixed() +# Remove all background, axes, and gridlinestheme_void()# Render the plotplot(g1)
5 - Mirror, mirror on the wall, who’s the ugliest of them all?
Question #5
Mirror, mirror on the wall, who’s the ugliest of them all? Make a plot of the variables in the penguins dataset from the palmerpenguins package. Your plot should use at least two variables, but more is fine too. First, make the plot using the default theme and color scales. Then, update the plot to be as ugly as possible. You will probably want to play around with theme options, colors, fonts, etc. The ultimate goal is the ugliest possible plot, and the sky is the limit!
Code
# Load necessary libraries# Remove rows with missing valuespenguins_clean <-na.omit(penguins)# Create scatter plot: bill length vs flipper length, colored by speciesg5a <-ggplot(data = penguins_clean, aes(x = bill_length_mm, y = flipper_length_mm, color = species)) +geom_point() +labs(title ="Plot #1 - normal plot\nBill Length vs Flipper Length by Species",x ="Bill Length (mm)",y ="Flipper Length (mm)" )# - show itg5a
Question #5
The ultimate goal is the ugliest possible plot, and the sky is the limit! Snakes on a plane? No. Penguins on a Sphere!
Code
# Step 1: Clean the datapenguins_clean <-na.omit(penguins)# Step 2: Map to spherical coordinates# - θ (theta): polar angle (bill length) mapped to [0, π]# - φ (phi): azimuthal angle (flipper length) mapped to [0, 2π]penguins_sphere <- penguins_clean %>%mutate(theta =rescale(bill_length_mm, to =c(0, pi)),phi =rescale(flipper_length_mm, to =c(0, 2* pi)),r =1,x = r *sin(theta) *cos(phi),y = r *sin(theta) *sin(phi),z = r *cos(theta) )# Step 3: Build the 3D scatter plotplot_3d <-plot_ly(data = penguins_sphere,x =~x, y =~y, z =~z,type ='scatter3d',mode ='markers',color =~species,colors =c("Adelie"="red", "Chinstrap"="green", "Gentoo"="blue"),marker =list(size =4)) %>%layout(title =list(text ="Plot #2 – Penguins on a Sphere",font =list(size =25) ),margin =list(t =100, # ← increase top margin herer =30 ),scene =list(xaxis =list(title ="X"),yaxis =list(title ="Y"),zaxis =list(title ="Z") ),legend =list(title =list(text ="<b>Species</b>")) )# Step 4: Add description as an HTML caption below the plottagList( plot_3d,HTML(" <div style='margin-top: 1em; font-size: 14px; line-height: 1.4em;'> <strong>Mapping Description:</strong><br> • <b>Bill Length</b> → θ (polar angle, latitude)<br> • <b>Flipper Length</b> → φ (azimuthal angle, longitude)<br> • Radius is constant: <b>r = 1</b><br> • Penguins are plotted on the <b>surface of a unit sphere</b> </div> "))
Mapping Description:
• Bill Length → θ (polar angle, latitude)
• Flipper Length → φ (azimuthal angle, longitude)
• Radius is constant: r = 1
• Penguins are plotted on the surface of a unit sphere
Source Code
---title: "HW 02"author: "Nathan Herling"date: "2025-06-13"_due: "Friday-June-13-2025" format: html: embed-resources: true code-fold: true # enables collapsible code blocks code-tools: true # optionally adds buttons to show/hide all toc: true#custom field '_due', #if you want to show in the document:**Due Date:** {{_due}}---## 0 - Setup```{r setup, echo=FALSE}#--------------------->################## Package Setup ###################Check if pacman [package manager] is installed, if not install it.#throw [FYI] alert either way.if (!requireNamespace("pacman", quietly = TRUE)) { message("Installing 'pacman' (not found locally)...") install.packages("pacman")} else { message("[FYI]\n'pacman' already installed — skipping install.")}# use this line for installing/loading# pacman::p_load()# - packages to load stored in a variable (vector)pkgs <- c( "tidyverse", "glue", "scales", "lubridate", "patchwork", "ggh4x", "ggrepel", "openintro", "ggridges", "dsbox", "janitor", "here", "knitr", "ggthemes", "ggplot2", "kableExtra", "palmerpenguins", "grid", "htmltools", "plotly", "ggforce")# - load from the character array/vectorpacman::p_load(char=pkgs)# - install tidyverse/dsbox directly from Git Hub# - this allows for the possible need to install on a repo. pull.# - and, if it's already installed just thorw an alert.if (!requireNamespace("dsbox", quietly = TRUE)) { message("Installing 'dsbox' from GitHub (not found locally)...") suppressMessages(devtools::install_github("tidyverse/dsbox"))} else { message("[FYI]\n'dsbox' already installed — skipping GitHub install.")}# - alert to user packages loaded.# Set number of columns (adjustable)n_cols <- 4# Add * to each package namepkgs <- paste0("* ", pkgs)# Calculate number of rows based on total packagesn_rows <- ceiling(length(pkgs) / n_cols)# Pad with empty strings to complete gridpkgs_padded <- c(pkgs, rep("", n_rows * n_cols - length(pkgs)))# Create matrix (fill by row)pkg_matrix <- matrix(pkgs_padded, nrow = n_rows, byrow = TRUE)# Print headercat("The packages loaded:")# Loop and print each row (use invisible to suppress NULL)invisible(apply(pkg_matrix, 1, function(row) { cat(paste(format(row, width = 22), collapse = ""), "\n")}))#-------------------------->####################### Basic set Theme up ######################## ---- set theme for ggplot2ggplot2::theme_set(ggplot2::theme_minimal(base_size = 14))# set width of code outputoptions(width = 65)# set figure parameters for knitrknitr::opts_chunk$set( fig.width = 7, # 7" width fig.asp = 0.618, # the golden ratio fig.retina = 3, # dpi multiplier for displaying HTML output on retina fig.align = "center", # center align figures dpi = 300 # higher dpi, sharper image)## ---- end theme set up```## 1 - A new day, a new plot, a new geom> **Question #1**> <div style="background-color:#e0e0e0; margin:15px; padding:10px; border-radius:5px; border:1px solid #16DB65; overflow-wrap:anywhere; word-break:break-word; white-space:normal;">> A new day, a new plot, a new geom. The goal of this exercise is to learn about a new type of plot (ridgeline plot) and to learn how to make it. Use the > ```geom_density_ridges()``` function from the ```ggridges``` package to make a ridge plot of Airbnb review scores of Edinburgh neighborhoods. The neighborhoods should be ordered by their median review scores. The data can be found in the ```dsbox``` package, and it’s called ```edibnb```. Also include an interpretation for your visualization. You should review feedback from your Homework 1 to make sure you capture anything you may have missed previously.> </div><details><summary>Data Analysis - Q1</summary>```{r}#| label: label-me-1_data_table#| echo: false#| fig-asp: 0.63#| fig-width: 7#---------------------------------------># - A way to # 1. get my hw done# 2. supress warnings# 3. understand the contents of my data.# =======================================# Diagnostics Summary Table Preparation# =======================================#---------------------------------------># Data Typedata_type <-class(edibnb$review_scores_rating)# - info on the DSet#print(help("edibnb"))# - Summary Statssummary_stats <-summary(edibnb$review_scores_rating)# - Missing valuesmissing_count <-sum(is.na(edibnb$review_scores_rating))# - IQR and Outlier Detectionq1 <-quantile(edibnb$review_scores_rating, 0.25, na.rm =TRUE)q3 <-quantile(edibnb$review_scores_rating, 0.75, na.rm =TRUE)iqr <- q3 - q1lower_bound <- q1 -1.5* iqrupper_bound <- q3 +1.5* iqroutliers <- edibnb$review_scores_rating[!is.na(edibnb$review_scores_rating) & (edibnb$review_scores_rating < lower_bound | edibnb$review_scores_rating > upper_bound)]# - Construct a table of key diagnosticsdiagnostics_table <- tibble::tibble(Metric =c("Data Type","Min", "1st Quartile", "Median", "Mean", "3rd Quartile", "Max","Missing Values","IQR", "Lower Outlier Bound", "Upper Outlier Bound", "Outlier Count" ),Value =c( data_type, summary_stats[["Min."]], summary_stats[["1st Qu."]], summary_stats[["Median"]], summary_stats[["Mean"]], summary_stats[["3rd Qu."]], summary_stats[["Max."]], missing_count, iqr, lower_bound, upper_bound,length(outliers) ))# - Print the diagnostics table with styled caption and highlighted rowdiagnostics_table |>kable(caption ="<span style='font-weight:bold; font-size:1.2em;'>Table 1. Diagnostic Summary for <code>review_scores_rating</code> (edibnb data set)</span>",escape =FALSE ) |>kable_styling(full_width =FALSE, bootstrap_options =c("striped", "hover")) |>row_spec(which(diagnostics_table$Metric =="Missing Values"), #-Highlight "Missing Values" rowbackground ="#fff9c4"# - Light yellow )```</details>```{r}#| label: label-me-1#| fig-asp: 0.63#| fig-width: 7#| code-fold: true# =========================================# Visualization: Ridge Plot by Neighborhood# =========================================# - Filter out rows with NA in neighbourhood or review_scores_ratingedibnb_clean <- edibnb |>filter(!is.na(neighbourhood), !is.na(review_scores_rating))# - Calculate median review scores by neighborhoodmedian_scores <- edibnb_clean |>group_by(neighbourhood) |>summarize(median_score =median(review_scores_rating, na.rm =TRUE)) |>arrange(median_score)# - Reorder neighborhoods by median scoreedibnb_clean <- edibnb_clean |>mutate(neighbourhood =factor(neighbourhood, levels = median_scores$neighbourhood))# ==== New: Calculate mean review scores for each neighborhood (for annotation) ====mean_scores <- edibnb_clean |>group_by(neighbourhood) |>summarize(mean_score =mean(review_scores_rating, na.rm =TRUE))# ==== Modified plot: add geom_text for mean scores offset to the right ====# - Get the first and last neighborhood (lowest & highest median)low_neigh <-levels(edibnb_clean$neighbourhood)[1]high_neigh <-levels(edibnb_clean$neighbourhood)[length(levels(edibnb_clean$neighbourhood))]# - Maximum x-position for annotationmax_annot_x <-max(mean_scores$mean_score, na.rm =TRUE) +25# Create a named vector of colors for neighborhoods - color for the envelopes.neighborhood_colors <-setNames(viridis::viridis(length(levels(edibnb_clean$neighbourhood))), levels(edibnb_clean$neighbourhood))#===============# Make the plot#===============g1 <-ggplot(edibnb_clean, aes(x = review_scores_rating, y = neighbourhood)) +geom_density_ridges( # - use geom_density_ridges aes(color = neighbourhood),fill ="cornsilk4", # - fill in the area under the curvealpha =0.7,scale =2.0, # - this makes the ridges 'larger' on the plotted area.bandwidth =1, # you can adjust this value as neededshow.legend =FALSE ) +geom_point(data = mean_scores,aes(x = mean_score, y = neighbourhood, color ="Mean Score"),size =3,shape =18 ) +geom_segment(data = mean_scores,aes(x = mean_score,xend = mean_score +20,y = neighbourhood,yend =as.numeric(neighbourhood) +0.5 ),color ="black",linewidth =0.3 ) + ggrepel::geom_text_repel(data = mean_scores,aes(x = mean_score +20,y =as.numeric(neighbourhood) +0.5,label =round(mean_score, 1) ),hjust =0,size =3.2,color ="black",nudge_y =0.15,segment.color =NA,direction ="y",box.padding =0.3,point.padding =0.5,show.legend =FALSE ) +annotate( # - put an annotation next to the 'low' value"text",x = max_annot_x,y =1.5,label ="Low",hjust =-1,size =5,color = neighborhood_colors[[low_neigh]] ) +annotate( # - put an annotation next to the 'high' value"text",x = max_annot_x,y =length(levels(edibnb_clean$neighbourhood)) +0.5,label ="High",hjust =-1,size =5,color = neighborhood_colors[[high_neigh]] ) +scale_x_continuous(name ="Review Scores Rating", limits =c(0, 150)) +scale_color_manual( # - adds color to the curve envelopes..name ="Legend",values =c(neighborhood_colors, "Mean Score"="black"),breaks ="Mean Score",labels ="Mean Score" ) +labs(title ="Distribution of Airbnb Review Scores by Edinburgh Neighborhood\n", y ="Neighborhood",caption ="Source: Inside Airbnb via Kaggle (data from June 2019)") +theme_ridges() +theme(legend.position ="right",legend.title =element_text(hjust =0.6),axis.title.y =element_text(vjust =1.5, hjust =0.5),plot.title =element_text(hjust =0.5),axis.title.x =element_text(hjust =0.5),plot.caption =element_text( # - stylizing the captionhjust =0.5, # right aligncolor ="grey50", # lighter greysize =8 ) ) # - show the graphg1```<div style="background-color:#e0e0e0; margin:15px; padding:10px; border-radius:5px; border:1px solid #16DB65; display:flex; align-items:center; justify-content:center; height:auto;"> <strong><br>Interpretation</strong> The graph (`Distribution of Airbnb Review Scores by Edinburgh Neighborhood`) displays the distribution of Airbnb review scores across Edinburgh neighborhoods using ridgeline plots, with each neighborhood’s mean score marked by a diamond (via double encoding). The mean review scores are generally high, ranging from about 93.9 to 95.9, on a scale of 0-100. Some neighborhoods, like Morningside and Bruntsfield, show slightly higher average scores. The variation in score spread highlights differences in review consistency between neighborhoods, making it easier to compare where listings tend to receive better feedback.</div>## 2 - Foreign Connected PACs<div style="background-color:#f0f0f0; margin:15px; padding:10px; border-radius:5px; border:1px solid #87CEEB;"><b>Question #2a</b><br><div style="background-color:#e0e0e0; padding:2px; border-radius:5px; border:1px solid #16DB65; display:flex; align-items:center; justify-content:center; height:auto;"> <strong><br>Make a graph: Contributions to US political parties from UK-connected PACs.</strong></div></div><details><summary>Data Analysis - Q2</summary>```{r}#| label: label-me-2_table#| echo: false#| fig-width: 10# Suppress readr column type messagesoptions(readr.show_col_types =FALSE) # ==================# Load and combine data# ==================list_of_files <- fs::dir_ls(path ="data", regexp ="Foreign Connected PAC")# - Read and clean datalist_of_files <- fs::dir_ls(path ="data", regexp ="Foreign Connected PAC")# - use janitor package clean_names()pac <-read_csv(list_of_files, id ="year") |>clean_names() |>mutate(year =str_extract(year, "\\d{4}(?=\\.csv$)") |>as.integer()) |>separate(country_of_origin_parent_company,into =c("country", "parent_company"),sep ="/", fill ="right", extra ="merge") |>mutate(across(c(dems, repubs), ~as.numeric(str_replace_all(., "[\\$,]", "")))) |>select(year, pac_name_affiliate, country, parent_company, dems, repubs)# - Function to create diagnostic summarygenerate_diagnostics_table <-function(vec, var_name) { summary_stats <-summary(vec) missing_count <-sum(is.na(vec)) q1 <-quantile(vec, 0.25, na.rm =TRUE) q3 <-quantile(vec, 0.75, na.rm =TRUE) iqr <- q3 - q1 lower_bound <- q1 -1.5* iqr upper_bound <- q3 +1.5* iqr outliers <- vec[!is.na(vec) & (vec < lower_bound | vec > upper_bound)] tibble::tibble(Variable = var_name,Metric =c("Data Type", "Min", "1st Quartile", "Median", "Mean", "3rd Quartile", "Max","Missing Values", "IQR", "Lower Outlier Bound", "Upper Outlier Bound", "Outlier Count" ),Value =c(class(vec), summary_stats[["Min."]], summary_stats[["1st Qu."]], summary_stats[["Median"]], summary_stats[["Mean"]], summary_stats[["3rd Qu."]], summary_stats[["Max."]], missing_count, iqr, lower_bound, upper_bound,length(outliers) ) )}# - Combine both diagnosticscombined_diag <-bind_rows(generate_diagnostics_table(pac$dems, "dems"),generate_diagnostics_table(pac$repubs, "repubs"))# - Render the tablecombined_diag |>kable(caption ="<span style='font-weight:bold; font-size:1.2em;'>Table 1. Diagnostic Summary for <code>dems</code> and <code>repubs</code></span>",escape =FALSE ) |>kable_styling(full_width =FALSE, bootstrap_options =c("striped", "hover")) |>row_spec(which(combined_diag$Metric =="Missing Values"), background ="#fff9c4")```</details>```{r}#| label: label-me-2a#| code-fold: true#| fig-width: 10# ====================================# Data Wrangle - Pivot and Summarize# ===================================# - Pivot data to longer format: create 'party' and 'amount' columnspac_long <- pac |>pivot_longer(cols =c(dems, repubs),names_to ="party",values_to ="amount" ) |># - Replace party values with readable labelsmutate(party =recode(party,"dems"="Democrat","repubs"="Republican"))# - Filter for UK-origin PACs and remove NA values in amountyearly_totals <- pac_long |>filter(country =="UK", !is.na(amount)) |># - Summarize total contributions by year and partygroup_by(year, party) |>summarise(amount =sum(amount), .groups ="drop")# ==================# Create the Plot# ==================g2a <-ggplot(yearly_totals, aes(x = year, y = amount, color = party)) +geom_line(linewidth =1.2) +# Keep line onlyscale_y_continuous(name ="Total amount",labels =label_dollar(scale =1e-6, suffix ="M") ) +scale_x_continuous(name ="Year",breaks = scales::pretty_breaks() ) +scale_color_manual(values =c("Democrat"="blue", "Republican"="red"),labels =c("Democrat", "Republican"),name ="Party" ) +labs(title ="Contributions to US political parties from UK-connected PACs",caption ="Source: OpenSecrets.org" ) +theme(legend.position.inside =c(0.9, 0.15), # - This may cause clipping warningsaxis.title.x =element_text(hjust =0), # - Align X-axis label leftaxis.title.y =element_text(hjust =0), # - Align Y-axis label leftplot.caption =element_text(hjust =1, size =12) )# - Show the plotg2a```<div style="background-color:#f0f0f0; padding:10px; border-radius:5px; border:1px solid #87CEEB; margin:20px"><b>Question #2b</b><br><div style="background-color:#e0e0e0; padding:10px; border-radius:5px; border:1px solid #16DB65; display:flex; align-items:center; justify-content:center; height:auto;"> <strong><br>Make a graph: Contributions to US political parties from non-UK-connected PACs. <br>Let's pick Switzerland.</strong></div></div>```{r}#| label: label-me-2b#| code-fold: true#| fig-width: 10# ==================================# Data Wrangle - Pivot and Summarize# ==================================pac_long <- pac |>pivot_longer(cols =c(dems, repubs),names_to ="party",values_to ="amount" ) |>mutate(party =recode(party,"dems"="Democrat","repubs"="Republican"))yearly_totals <- pac_long |>filter(country =="Switzerland", !is.na(amount)) |>group_by(year, party) |>summarise(amount =sum(amount), .groups ="drop")# ==================# Create the Plot# ==================g2a <-ggplot(yearly_totals, aes(x = year, y = amount, color = party)) +geom_line(linewidth =1.2) +scale_y_continuous(name ="Total amount",labels =label_dollar(scale =1e-6, suffix ="M") ) +scale_x_continuous(name ="Year",breaks = scales::pretty_breaks() ) +scale_color_manual(values =c("Democrat"="blue", "Republican"="red"),labels =c("Democrat", "Republican"),name ="Party" ) +labs(title ="Contributions to US political parties from Switzerland-connected PACs") +theme(legend.position.inside =c(0.9, 0.15),axis.title.x =element_text(hjust =0),axis.title.y =element_text(hjust =0) )# - Show the plot# print(g2a)g2a```## 3 - Median housing prices in the US<div style="background-color:#f0f0f0; padding:10px; border-radius:5px; border:1px solid #87CEEB; margin: 20px"><b>Question #3a</b><br><div style="background-color:#e0e0e0; padding:10px; border-radius:5px; border:1px solid #16DB65; display:flex; align-items:center; justify-content:center; height:auto;"> <strong><br>Re-create the graph: `Median Housing Prices in the US - not seasonally adjusted`</strong></div></div><details><summary>Data Analysis - Q3</summary>```{r}#| label: label-me-3_table#| echo: false# Load median housing data - rename the variables as 'date' and 'price'median_housing <-read_csv("data/median-housing.csv") |>rename(date = DATE, price = MSPUS) |>mutate(date = lubridate::ymd(date))# Load recession datarecessions <-read_csv("data/recessions.csv") |>rename(start = Peak, end = Trough) |>mutate(start = lubridate::ymd(start), end = lubridate::ymd(end))# Prepare diagnostics for median housingmh_col_types <-sapply(median_housing, class)mh_missing <-sapply(median_housing, function(x) sum(is.na(x)))mh_price_summary <-summary(median_housing$price)mh_date_range <-range(median_housing$date, na.rm =TRUE)mh_diag <-tibble(Dataset ="median_housing",Metric =c("date - Data Type", "price - Data Type", "date - Missing Values", "price - Missing Values","price - Min", "price - 1st Quartile", "price - Median", "price - Mean","price - 3rd Quartile", "price - Max","date - Range Start", "date - Range End"),Value =c(mh_col_types["date"], mh_col_types["price"], mh_missing["date"], mh_missing["price"], mh_price_summary["Min."], mh_price_summary["1st Qu."], mh_price_summary["Median"], mh_price_summary["Mean"], mh_price_summary["3rd Qu."], mh_price_summary["Max."],as.character(mh_date_range[1]), as.character(mh_date_range[2])))# Prepare diagnostics for recessionsrecess_col_types <-sapply(recessions, class)recess_missing <-sapply(recessions, function(x) sum(is.na(x)))recess_start_range <-range(recessions$start, na.rm =TRUE)recess_end_range <-range(recessions$end, na.rm =TRUE)recess_diag <-tibble(Dataset ="recessions",Metric =c("start - Data Type", "end - Data Type","start - Missing Values", "end - Missing Values","start - Range Start", "start - Range End","end - Range Start", "end - Range End"),Value =c(recess_col_types["start"], recess_col_types["end"], recess_missing["start"], recess_missing["end"],as.character(recess_start_range[1]), as.character(recess_start_range[2]),as.character(recess_end_range[1]), as.character(recess_end_range[2])))# Combine the two diagnosticscombined_diag <-bind_rows(mh_diag, recess_diag)# Print combined diagnostics tablecombined_diag |>kable(caption ="Combined Diagnostic Summary for Median Housing and Recession Data",col.names =c("Dataset", "Metric", "Value"),escape =FALSE ) |>kable_styling(full_width =FALSE, bootstrap_options =c("striped", "hover")) |>row_spec(which(combined_diag$Metric %in%c("date - Missing Values", "price - Missing Values", "start - Missing Values", "end - Missing Values")),background ="#fff9c4") # Highlight missing values rows```</details>```{r}#| label: label-me-3a#| code-fold: true# Load the median housing data# median_housing <- read_csv("data/median-housing.csv") |># rename(date = DATE, price = MSPUS) |># mutate(date = ymd(date)) ## # # - Load the recession data# recessions <- read_csv("data/recessions.csv") |># rename(start = Peak, end = Trough) |># mutate(start = ymd(start), end = ymd(end)) # mutate to y-m-d format# Custom y-axis breaks (extend past 400,000 to ensure it shows)y_breaks <-seq(0, 440000, by =40000)# ADDED: Custom x-axis breaks (exclude endpoints)x_breaks <-seq(1965, 2020, by =5) |>paste0("-01-01") |>ymd()# Create the visualizationg3a <-ggplot(median_housing, aes(x = date, y = price)) +geom_line(color ="#4d72e3") +# - REMOVED: Recession shading (geom_rect layer)# - REPLACED: Restrict x-axis and apply custom breaksscale_x_date(breaks = x_breaks,limits =c(ymd("1963-01-01"), ymd("2021-04-01")), # x-limitsdate_labels ="%Y"# format to '2000' ) +# Custom y-axis breaks and formatted labelsscale_y_continuous(breaks = y_breaks, labels =comma_format(),limits =c(0, 400000), # y-limitsexpand =c(0, 0) ) +# Axis labelslabs(title ="Median sales price of houses sold in the United States\nNot seasonally adjusted",y ="Dollars",x =NULL,# - Source annotationcaption ="Sources: Census; HUD" ) +theme_minimal() +# - Right-align the captiontheme(plot.title =element_text(hjust =0,margin =margin(b=15,l=-50,r=0) # - control margins of plot title. ), # Align leftplot.caption =element_text(hjust =1, size =9, face ="italic"),plot.margin =margin(t =5, r =0, b =10, l =30), # Increase left margin to push plot body rightpanel.grid.major.x =element_blank(), # Remove major vertical grid linespanel.grid.minor.x =element_blank(), # Remove minor vertical grid linespanel.grid.major.y =element_line(), # Keep major horizontal grid linespanel.grid.minor.y =element_blank() # Remove minor horizontal grid lines )plot(g3a)```<div style="background-color:#f0f0f0; padding:10px; border-radius:5px; border:1px solid #87CEEB; margin: 20px"><b>Question #3b</b><br><div style="background-color:#e0e0e0; padding:10px; border-radius:5px; border:1px solid #16DB65; display:flex; align-items:center; justify-content:center; height:auto;"> <strong><br>• Identify recessions that happened during the time frame of the median_housing dataset. Do this by adding a new variable to recessions that takes the value TRUE if the recession happened during this time frame and FALSE if not.<br>• Now recreate the following visualization. The shaded areas are recessions that happened during the time frame of the median_housing dataset. Hint: The shaded areas are "behind" the line. </strong></div></div>```{r}#| label: label-me-3b#| code-fold: true# Load the median housing data - assumed already loaded elsewhere# median_housing <- read_csv("data/median-housing.csv") |> # rename(date = DATE, price = MSPUS) |> # mutate(date = ymd(date)) # Load the recession datarecessions <-read_csv("data/recessions.csv") |>rename(start = Peak, end = Trough) |>mutate(start =ymd(start), end =ymd(end)) # Custom y-axis breaks (extend past 400,000 to ensure it shows)y_breaks <-seq(0, 440000, by =40000)# Custom x-axis breaks (exclude endpoints)x_breaks <-seq(1965, 2020, by =5) |>paste0("-01-01") |>ymd()# Create the visualizationg3b <-ggplot(median_housing, aes(x = date, y = price)) +geom_line(color ="#4d72e3") +# Recession shadinggeom_rect(data = recessions,aes(xmin = start, xmax = end, ymin =-Inf, ymax =Inf),inherit.aes =FALSE,fill ="gray",alpha =0.3 ) +# Restrict x-axis and apply custom breaksscale_x_date(breaks = x_breaks,limits =c(ymd("1961-01-01"), ymd("2025-01-01")),date_labels ="%Y" ) +# Custom y-axis breaks and formatted labelsscale_y_continuous(breaks = y_breaks,labels =comma_format(),limits =c(0, 400000),expand =c(0, 0) ) +# Axis labelslabs(title ="Median sales price of houses sold in the United States\nNot seasonally adjusted",y ="Dollars",x ="Shaded areas indicate U.S. recessions",caption ="Source: Census; HUD" ) +theme_minimal() +# Customize theme for captions, margins, and gridstheme(plot.caption =element_text(hjust =1, size =9, face ="italic"),plot.title =element_text(hjust =0,margin =margin(b =15, l =-50, r =0) ),plot.margin =margin(t =5, r =0, b =10, l =30),axis.title.x =element_text(hjust =1, # right-align the x-axis labelmargin =margin(t =10) # optional: add some space above the label ),# Grid linespanel.grid.major.x =element_blank(),panel.grid.minor.x =element_blank(),panel.grid.major.y =element_line(),panel.grid.minor.y =element_blank() )# Print plot without warningssuppressWarnings(print(g3b))```::: {.alert-info style="background-color:#FFA07A; color:black; padding:10px; border-radius:5px; display: flex; align-items: center; min-height: 60px;"}**3b-Note:** Some recession rows were intentionally excluded for the purpose of the assignment.:::<div style="background-color:#f0f0f0; padding:10px; border-radius:5px; border:1px solid #87CEEB; margin: 20px"><b>Question #3c</b><br><div style="background-color:#e0e0e0; padding:10px; border-radius:5px; border:1px solid #16DB65; display:flex; align-items:center; justify-content:center; height:auto;"> <strong><br>• Create a subset of `median_housing` dataset from 2019 and 2020. Add two columns: `year` and `quarter`. `year` is the year of the `date` and the `quarter` takes the values of Q1, Q2, Q3, or Q4 based on `date`<br>• Re-create the visualization. </strong></div></div>```{r}#| label: label-me-3c#| code-fold: true# - Load the data (already loadedNo)# median_housing <- read_csv("data/median-housing.csv") |># rename(date = DATE, price = MSPUS) |># mutate(date = ymd(date))# - Filter to 2019 and 2020 only, keeping quarterly pointsquarter_dates <-seq(ymd("2019-01-01"), ymd("2020-10-01"), by ="3 months")median_housing_q <- median_housing |>filter(date %in% quarter_dates)# - Define quarter labelsquarter_labels <-rep(c("Q1", "Q2", "Q3", "Q4"), times =2)# - Define year sub-labels (x-axis) - best I could figure out how to doyear_labels <-c("", sprintf("%25s", "2019"), "", "","", sprintf("%25s", "2020"), "", "")# - Combine into single x-axis label with optional sub-labelscombined_labels <-ifelse( year_labels =="", quarter_labels,paste0(quarter_labels, "\n", year_labels))#===============# - Plot data#===============g3c <-ggplot(median_housing_q, aes(x = date, y = price)) +# --- Change 1 & 2: Thicken the line and make sure it does not run through points ---geom_line(color ="#4d72e3", linewidth =1, lineend ="round") +# Slightly thicker line# --- Change 1: Make circles look 'empty' by drawing points with stroke and fill white ---geom_point(shape =21, size =2, color ="#4d72e3", fill ="white", stroke =1.2) +# Empty circles# --- Change 3: Limit line to end exactly at Q4 by limiting x-axis ---scale_x_date(breaks = quarter_dates,labels = combined_labels,limits =c(ymd("2019-01-01"), ymd("2020-10-01")), # Already correct rangeexpand =c(0.02, 0.02) # <<< CHANGE: Prevent extension beyond last Q4 tick) +scale_y_continuous(breaks =seq(300000, 360000, by =20000),labels =comma_format(),limits =c(300000, 360000),expand =c(0, 0) ) +labs(title ="Median sales price of houses sold in the United States\nNot seasonally adjusted",x =NULL,y ="Dollars" ) +theme_minimal() +theme(plot.title =element_text(hjust =0,margin =margin(b=15,l=-50,r=0) # Control margins of plot title. ), # Align leftplot.caption =element_text(hjust =1, size =9, face ="italic"),plot.margin =margin(t =5, r =0, b =40, l =30), # Increase left margin to push plot body right#panel.grid.major.x = element_blank(), # Remove major vertical grid linespanel.grid.minor.x =element_blank(), # Remove minor vertical grid linespanel.grid.major.y =element_line(), # Keep major horizontal grid lines#panel.grid.minor.y = element_blank(), # Remove minor horizontal grid lines ) +# --- CHANGE: Use annotation_custom to add year labels below x-axis ---# Add year labels between Q2 and Q3 using annotate()annotate("text", x =ymd("2019-05-16"), y =298000, label ="2019", size =3.5) +annotate("text", x =ymd("2020-05-16"), y =298000, label ="2020", size =3.5)# - PlotsuppressWarnings(print(g3c))```::: {.alert-info style="background-color:#FFA07A; color:black; padding:10px; border-radius:5px; display: flex; align-items: center; min-height: 60px;"}**3c-Note:** Some recession rows were intentionally excluded for the purpose of the assignment.:::## 4 - Expect More. Plot More.<div style="background-color:#f0f0f0; padding:10px; border-radius:5px; border:1px solid #87CEEB; height:auto"><b>Question #4</b><br><div style="background-color:#e0e0e0; padding:10px; border-radius:5px; border:1px solid #16DB65; display:flex; align-items:center; justify-content:center; height:auto;"> <strong><br>Recreate the Target LOGO.<br>Describe steps..<br>(see code comments) <br>1. make inner dot<br>2. Make outer ring <br>3. Make 'Target' use '[R]' symbol<br>4. Piece it all together.</strong></div></div>```{r}#| label: label-me-4#| code-fold: true# library(ggplot2)# library(ggforce)# Define Target's signature red colortarget_red <-"#e82118"# Create a data frame for the concentric circles (outer red ring and inner white ring)circles <-data.frame(x0 =0, y0 =0, # Center of both circlesr =c(1, 0.7), # Radii for the outer red ring and inner white ringfill =c(target_red, "white") # Fill colors for each ring)# Construct the plotg1 <-ggplot() +# Draw the red and white rings (two concentric circles)geom_circle(data = circles, aes(x0 = x0, y0 = y0, r = r, fill = fill), color =NA) +# Draw the solid inner red circle (target bullseye center)geom_circle(aes(x0 =0, y0 =0, r =0.3), fill = target_red, color =NA) +# Add the text "TARGET" beneath the bullseyeannotate("text", x =0, y =-1.4, label ="TARGET", color = target_red, size =10, fontface ="bold") +# Add the registered trademark symbol next to "TARGET"annotate("text", x =0.6, y =-1.52, label ="®", color = target_red, size =8) +# Use fill colors as provided (don't map them to a color scale)scale_fill_identity() +# Ensure equal aspect ratio for x and y (perfect circles)coord_fixed() +# Remove all background, axes, and gridlinestheme_void()# Render the plotplot(g1)```## 5 - Mirror, mirror on the wall, who’s the ugliest of them all?<div style="background-color:#f0f0f0; padding:10px; border-radius:5px; border:1px solid #87CEEB; margin:20px"><b>Question #5</b><br><div style="background-color:#e0e0e0; padding:10px; border-radius:5px; border:1px solid #16DB65; display:flex; align-items:center; justify-content:center; height:auto;"> <strong><br>Mirror, mirror on the wall, who’s the ugliest of them all? Make a plot of the variables in the penguins dataset from the palmerpenguins package. Your plot should use at least two variables, but more is fine too. First, make the plot using the default theme and color scales. Then, update the plot to be as ugly as possible. You will probably want to play around with theme options, colors, fonts, etc. The ultimate goal is the ugliest possible plot, and the sky is the limit!</strong></div></div>```{r}#| label: label-me-5a#| code-fold: true# Load necessary libraries# Remove rows with missing valuespenguins_clean <-na.omit(penguins)# Create scatter plot: bill length vs flipper length, colored by speciesg5a <-ggplot(data = penguins_clean, aes(x = bill_length_mm, y = flipper_length_mm, color = species)) +geom_point() +labs(title ="Plot #1 - normal plot\nBill Length vs Flipper Length by Species",x ="Bill Length (mm)",y ="Flipper Length (mm)" )# - show itg5a```<div style="background-color:#f0f0f0; padding:10px; border-radius:5px; border:1px solid #87CEEB; margin:60px"><b>Question #5</b><br><div style="background-color:#e0e0e0; padding:10px; border-radius:5px; border:1px solid #16DB65; display:flex; align-items:center; justify-content:center; height:auto;"> <strong><br>The ultimate goal is the ugliest possible plot, and the sky is the limit!<br> Snakes on a plane? No. Penguins on a Sphere!</strong></div></div>```{r}#| code-fold: true#| label: label-me-5b# Step 1: Clean the datapenguins_clean <-na.omit(penguins)# Step 2: Map to spherical coordinates# - θ (theta): polar angle (bill length) mapped to [0, π]# - φ (phi): azimuthal angle (flipper length) mapped to [0, 2π]penguins_sphere <- penguins_clean %>%mutate(theta =rescale(bill_length_mm, to =c(0, pi)),phi =rescale(flipper_length_mm, to =c(0, 2* pi)),r =1,x = r *sin(theta) *cos(phi),y = r *sin(theta) *sin(phi),z = r *cos(theta) )# Step 3: Build the 3D scatter plotplot_3d <-plot_ly(data = penguins_sphere,x =~x, y =~y, z =~z,type ='scatter3d',mode ='markers',color =~species,colors =c("Adelie"="red", "Chinstrap"="green", "Gentoo"="blue"),marker =list(size =4)) %>%layout(title =list(text ="Plot #2 – Penguins on a Sphere",font =list(size =25) ),margin =list(t =100, # ← increase top margin herer =30 ),scene =list(xaxis =list(title ="X"),yaxis =list(title ="Y"),zaxis =list(title ="Z") ),legend =list(title =list(text ="<b>Species</b>")) )# Step 4: Add description as an HTML caption below the plottagList( plot_3d,HTML(" <div style='margin-top: 1em; font-size: 14px; line-height: 1.4em;'> <strong>Mapping Description:</strong><br> • <b>Bill Length</b> → θ (polar angle, latitude)<br> • <b>Flipper Length</b> → φ (azimuthal angle, longitude)<br> • Radius is constant: <b>r = 1</b><br> • Penguins are plotted on the <b>surface of a unit sphere</b> </div> "))```